Computing Molecular Signatures as Optima of a Bi-Objective Function: Method and Application to Prediction in Oncogenomics
Authors
Abstract
BACKGROUND: Filter feature selection methods compute molecular signatures by selecting subsets of genes from the ranking induced by a valuation function. The motivation for the choice of valuation function is almost always clearly stated, but the rationale for selecting genes according to their ranking is hardly ever made explicit.
METHOD: We addressed the computation of molecular signatures by searching for the optima of a bi-objective function whose solution space was the set of all possible molecular signatures, i.e., the set of subsets of genes. The two objectives were the size of the signature (to be minimized) and the interclass distance induced by the signature (to be maximized).
RESULTS: We showed that: 1) the convex combination of the two objectives had exactly n optimal nonempty signatures, where n was the number of genes, 2) the n optimal signatures were nested, and 3) the optimal signature of size k was the subset of the k top-ranked genes that contributed the most to the interclass distance. We applied our feature selection method to five public datasets in oncology and assessed the prediction performance of the optimal signatures as input to the diagonal linear discriminant analysis (DLDA) classifier. The performances were equal to or better than the best reported ones. The predictions were robust, and the signatures were almost always significantly smaller. We studied in more detail the performance of our predictive modeling on two breast cancer datasets for predicting the response to preoperative chemotherapy: the performances were higher than those previously reported, the signatures were about three times smaller (11 genes versus 30 genes), and the genes in the signature were known to be involved in the response to chemotherapy.
CONCLUSIONS: Defining molecular signatures as the optima of a bi-objective function that combined the signature size and the interclass distance was well founded and efficient for prediction in oncogenomics. The complexity of the computation was very low because the optimal signatures were the sets of top-ranked genes in the ranking of their valuation. Software can be freely downloaded from http://gardeux-vincent.eu/DeltaRanking.php.
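The ranking result stated in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software: it assumes an additive interclass distance in which each gene contributes the absolute difference between its two class means (the abstract does not specify the distance), and the function names `gene_contributions`, `nested_signatures`, and `brute_force_best` are hypothetical. Under that assumption, the best signature of size k is the set of the k top-ranked genes, so the n signatures are nested and no subset search is needed.

```python
from itertools import combinations


def gene_contributions(class_a, class_b):
    """Per-gene contribution to an ASSUMED additive interclass distance:
    the absolute difference between the two class means of that gene."""
    n_genes = len(class_a[0])
    contributions = []
    for g in range(n_genes):
        mean_a = sum(sample[g] for sample in class_a) / len(class_a)
        mean_b = sum(sample[g] for sample in class_b) / len(class_b)
        contributions.append(abs(mean_a - mean_b))
    return contributions


def nested_signatures(contributions):
    """Return the n nested signatures: the k-th one holds the k top-ranked genes."""
    ranking = sorted(range(len(contributions)), key=lambda g: -contributions[g])
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


def brute_force_best(contributions, k):
    """Exhaustive check: the size-k subset with the largest summed contribution."""
    genes = range(len(contributions))
    return max(combinations(genes, k),
               key=lambda subset: sum(contributions[g] for g in subset))


# Toy expression data (made up for illustration): rows are samples, columns are genes.
class_a = [[1.0, 5.0, 2.0], [1.2, 5.5, 2.1]]
class_b = [[1.1, 2.0, 3.0], [1.3, 2.5, 3.3]]

contributions = gene_contributions(class_a, class_b)
signatures = nested_signatures(contributions)

# The ranking-based signature of size 2 matches the exhaustive search.
best_pair = sorted(brute_force_best(contributions, 2))
```

On this toy data, gene 1 separates the classes most strongly, followed by gene 2 and then gene 0, so each signature extends the previous one by the next gene in the ranking. Sorting costs O(n log n), whereas the exhaustive check enumerates every subset of a given size, which is why the ranking property keeps the computation cheap.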
Similar resources
A bi-level linear programming problem for computing the nadir point in MOLP
Computing the exact ideal and nadir criterion values is a very important subject in multi-objective linear programming (MOLP) problems. In fact, these values define the ideal and nadir points as lower and upper bounds on the nondominated points. Whereas determining the ideal point is easy, because it is equivalent to optimizing a convex function (linear function) over a con...
APPLICATION OF TABU SEARCH FOR SOLVING THE BI-OBJECTIVE WAREHOUSE PROBLEM IN A FUZZY ENVIRONMENT
The bi-objective warehouse problem in a crisp environment is often not effective in dealing with the imprecision or vagueness in the values of the problem parameters. To deal with such situations, several researchers have proposed that the parameters be represented as fuzzy numbers. We describe a new algorithm for the fuzzy bi-objective warehouse problem using a ranking function followed by an applic...
SIZE AND GEOMETRY OPTIMIZATION OF TRUSS STRUCTURES USING THE COMBINATION OF DNA COMPUTING ALGORITHM AND GENERALIZED CONVEX APPROXIMATION METHOD
In recent years, the optimization of truss structures has been considered due to their several applications and their simple structure and rapid analysis. DNA computing algorithm is a non-gradient-based method derived from numerical modeling of DNA-based computing performance by new computers with DNA memory known as molecular computers. DNA computing algorithm works based on collective intelli...
Benders’ decomposition algorithm to solve bi-level bi-objective scheduling of aircrafts and gate assignment under uncertainty
Management and scheduling of flights and assignment of gates to aircraft play a significant role in improving the performance of the airport, due to the growing number of flights and decreasing flight times. This research addresses the assignment and scheduling problem of runways and gates simultaneously. Moreover, this research is the first study that considers the constraint of unavailabil...
A New Mathematical Model for the Green Vehicle Routing Problem by Considering a Bi-Fuel Mixed Vehicle Fleet
This paper formulates a mathematical model for the Green Vehicle Routing Problem (GVRP), incorporating bi-fuel (natural gas and gasoline) pickup trucks in a mixed vehicle fleet. The objective is to minimize overall costs relating to service (earliness and tardiness), transportation (fixed, variable and fuel), and carbon emissions. To reflect a real-world situation, the study considers: (1) a co...